Introduction
Using the Spotify dataset provided on Kaggle by Andrii Samoshyn, I will be examining the determination of genre using the given audio feature variables that Spotify uses. As Spotify is widely and globally used, I want to observe and analysis how Spotify is able to determine and assign a genre to a song that is then used to provide recommendations to users. There are over 40000 observations with most being audio feature variables, such as danceability and acousticness.
Data Overview
Data summary
| Name |
spotify |
| Number of rows |
42305 |
| Number of columns |
22 |
| _______________________ |
|
| Column type frequency: |
|
| character |
8 |
| numeric |
14 |
| ________________________ |
|
| Group variables |
None |
Variable type: character
| type |
0 |
1.00 |
14 |
14 |
0 |
1 |
0 |
| id |
0 |
1.00 |
22 |
22 |
0 |
35877 |
0 |
| uri |
0 |
1.00 |
36 |
36 |
0 |
35877 |
0 |
| track_href |
0 |
1.00 |
56 |
56 |
0 |
35877 |
0 |
| analysis_url |
0 |
1.00 |
64 |
64 |
0 |
35877 |
0 |
| genre |
0 |
1.00 |
3 |
15 |
0 |
15 |
0 |
| song_name |
20786 |
0.51 |
1 |
138 |
0 |
15439 |
0 |
| title |
21525 |
0.49 |
4 |
49 |
0 |
132 |
0 |
Variable type: numeric
| danceability |
0 |
1.00 |
0.64 |
0.16 |
0.07 |
0.52 |
0.65 |
0.77 |
0.99 |
▁▂▇▇▃ |
| energy |
0 |
1.00 |
0.76 |
0.18 |
0.00 |
0.63 |
0.80 |
0.92 |
1.00 |
▁▁▃▅▇ |
| key |
0 |
1.00 |
5.37 |
3.67 |
0.00 |
1.00 |
6.00 |
9.00 |
11.00 |
▇▂▃▅▆ |
| loudness |
0 |
1.00 |
-6.47 |
2.94 |
-33.36 |
-8.16 |
-6.23 |
-4.51 |
3.15 |
▁▁▁▇▂ |
| mode |
0 |
1.00 |
0.55 |
0.50 |
0.00 |
0.00 |
1.00 |
1.00 |
1.00 |
▆▁▁▁▇ |
| speechiness |
0 |
1.00 |
0.14 |
0.13 |
0.02 |
0.05 |
0.08 |
0.19 |
0.95 |
▇▂▁▁▁ |
| acousticness |
0 |
1.00 |
0.10 |
0.17 |
0.00 |
0.00 |
0.02 |
0.11 |
0.99 |
▇▁▁▁▁ |
| instrumentalness |
0 |
1.00 |
0.28 |
0.37 |
0.00 |
0.00 |
0.01 |
0.72 |
0.99 |
▇▁▁▁▂ |
| liveness |
0 |
1.00 |
0.21 |
0.18 |
0.01 |
0.10 |
0.14 |
0.29 |
0.99 |
▇▃▁▁▁ |
| valence |
0 |
1.00 |
0.36 |
0.23 |
0.02 |
0.16 |
0.32 |
0.52 |
0.99 |
▇▇▅▃▁ |
| tempo |
0 |
1.00 |
147.47 |
23.84 |
57.97 |
129.93 |
144.97 |
161.46 |
220.29 |
▁▁▇▃▁ |
| duration_ms |
0 |
1.00 |
250865.85 |
102957.71 |
25600.00 |
179840.00 |
224760.00 |
301133.00 |
913052.00 |
▆▇▂▁▁ |
| time_signature |
0 |
1.00 |
3.97 |
0.27 |
1.00 |
4.00 |
4.00 |
4.00 |
5.00 |
▁▁▁▇▁ |
| unnamed_0 |
21525 |
0.49 |
10483.97 |
6052.36 |
0.00 |
5255.75 |
10479.50 |
15709.25 |
20999.00 |
▇▇▇▇▇ |
Upon initial analysis of the dataset, I noticed that there are no missing values for the variables of interest. The outcome variable that I am examining in response to the audio feature variables is ‘genre’. This variable consists of fifteen unique genres, such as trap, hip-hop, underground rap, pop, etc. After first loading the data, I noticed that the majority of the data and its observations fall under the ‘underground rap’ genre. Here I am demonstrating the distributions for the audio feature variables by each genre. Additionally, I examined some potentially interesting relationships between a few of the variables, such as loudness vs. energy and danceability vs. speechiness, to observe if there are any correlations between variables.
Methods
For this dataset, I will be utilizing the following models: decision trees and k-nearest neighbor. The decision tree model can be used to interpret, visualize, and identify which audio feature variables are most important in determining the genre of a song. K-nearest neighbor models are used to classify a new observation by finding the k observations in the training set that are closest to it and assigning it the class most common among those k neighbors. I will be tuning the minimum number samples ‘min_n’ parameter for the decision tree model and the ‘neighbors’ parameter for the k-nearest neighbor model. The recipe I will be using has ‘genre’ against all of the audio feature variables. I will be using repeated V-fold cross-validation in this situation to reduce the variance in the performance estimate of the model so that the evaluation of the model is more reliable. For this project, I will be utilizing the ROC curve and precision values to select the best model.
Model Building & Selection Results
After tuning both the decision tree and knn models using grid search and cross-validation, I was able to achieve improved performance. Overall, I noticed that the knn model had a higher ROC value than the decision tree model. However, it is important to note that the decision tree model had a higher precision value than the knn model, indicating that it was better at correctly predicting the positive cases (i.e. correctly identifying the genre).
Further tuning could be explored in the future, such as adjusting the number of neighbors for the knn model or exploring different splitting criteria for the decision tree model. Additionally, other models could be explored and compared, such as random forests or support vector machines.
In terms of systematic differences in performance between the model types, I observed that the decision tree model had a higher precision value, indicating that it was better at identifying the positive cases (correctly predicting the genre). On the other hand, the knn model had a higher ROC value, indicating that it was better at overall classification performance.
Based on my analysis and comparison of performance metrics, I select the knn model as our final/winning model. While the decision tree model had a higher precision value, I prioritize overall classification performance, which the knn model excelled at with its higher ROC value. It was not particularly surprising that the knn model won, as it is a commonly used classification algorithm that is known for its performance in many scenarios.
# A tibble: 9 × 7
min_n .metric .estimator mean n std_err .config
<int> <chr> <chr> <dbl> <int> <dbl> <chr>
1 2 f_meas macro 0.565 15 0.00311 Preprocessor1_Model1
2 2 precision macro 0.608 15 0.00205 Preprocessor1_Model1
3 2 roc_auc hand_till 0.839 15 0.00104 Preprocessor1_Model1
4 11 f_meas macro 0.565 15 0.00311 Preprocessor1_Model2
5 11 precision macro 0.608 15 0.00205 Preprocessor1_Model2
6 11 roc_auc hand_till 0.839 15 0.00104 Preprocessor1_Model2
7 20 f_meas macro 0.565 15 0.00311 Preprocessor1_Model3
8 20 precision macro 0.608 15 0.00205 Preprocessor1_Model3
9 20 roc_auc hand_till 0.839 15 0.00104 Preprocessor1_Model3
# A tibble: 9 × 7
neighbors .metric .estimator mean n std_err .config
<int> <chr> <chr> <dbl> <int> <dbl> <chr>
1 1 f_meas macro 0.481 15 0.00119 Preprocessor1_Model1
2 1 precision macro 0.477 15 0.00116 Preprocessor1_Model1
3 1 roc_auc hand_till 0.724 15 0.000673 Preprocessor1_Model1
4 10 f_meas macro 0.498 15 0.000912 Preprocessor1_Model2
5 10 precision macro 0.498 15 0.000960 Preprocessor1_Model2
6 10 roc_auc hand_till 0.861 15 0.000629 Preprocessor1_Model2
7 20 f_meas macro 0.502 15 0.00106 Preprocessor1_Model3
8 20 precision macro 0.517 15 0.00152 Preprocessor1_Model3
9 20 roc_auc hand_till 0.887 15 0.000755 Preprocessor1_Model3
# A tibble: 2 × 3
model ROC_AUC se
<chr> <dbl> <dbl>
1 Tree 0.839 0.00104
2 KNN 0.887 0.000755
Final Model Analysis
The final KNN model was selected based on its high ROC AUC score on the testing data. This model was fit to the testing data and performance was assessed with a confusion matrix. The confusion matrix showed that the model correctly predicted genres according to the ‘truth’.
The outcome variable was not transformed in this analysis.
Overall, the KNN model was the best performing model out of the ones tested, but it is important to note that the effort of building a predictive model should always be considered in relation to the payoff. In this case, the model did not significantly outperform a null model, which would simply predict the most frequent genre (rap) every time.
One potential feature of the KNN model that made it the best was its ability to fit nonlinearity well. However, there is still room for further exploration and tuning of the model in future analyses.
# A tibble: 8,462 × 2
genre .pred_class
<chr> <fct>
1 Dark Trap Underground Rap
2 Dark Trap dnb
3 Dark Trap Underground Rap
4 Dark Trap Underground Rap
5 Dark Trap Underground Rap
6 Dark Trap Underground Rap
7 Dark Trap Underground Rap
8 Dark Trap Underground Rap
9 Dark Trap Underground Rap
10 Dark Trap Underground Rap
# … with 8,452 more rows
Truth
Prediction Dark Trap dnb Emo hardstyle Hiphop Pop psytrance Rap RnB
Dark Trap 160 9 25 14 23 5 3 8 21
dnb 28 314 29 2 6 2 0 0 5
Emo 0 0 0 0 0 0 0 0 0
hardstyle 94 127 151 462 18 8 7 6 15
Hiphop 0 0 0 0 0 0 0 0 0
Pop 0 0 0 0 0 0 0 0 0
psytrance 74 85 1 78 0 0 479 1 1
Rap 0 0 0 0 0 0 0 0 0
RnB 0 0 0 0 0 0 0 0 0
techhouse 15 4 18 2 8 9 23 9 10
techno 71 3 1 1 1 1 60 1 0
trance 12 0 2 2 1 0 4 0 0
trap 10 26 9 47 3 3 2 2 1
Trap Metal 0 0 0 0 0 0 0 0 0
Underground Rap 473 43 94 20 544 74 5 349 410
Truth
Prediction techhouse techno trance trap Trap Metal Underground Rap
Dark Trap 1 5 26 5 20 23
dnb 0 0 0 11 17 5
Emo 0 0 0 0 0 0
hardstyle 7 1 149 132 33 24
Hiphop 0 0 0 0 0 0
Pop 0 0 0 0 0 0
psytrance 46 100 258 76 4 5
Rap 0 0 0 0 0 0
RnB 0 0 0 0 0 0
techhouse 363 43 12 7 5 6
techno 77 428 25 8 7 9
trance 0 1 54 2 6 1
trap 1 0 5 248 39 12
Trap Metal 0 0 0 0 0 0
Underground Rap 67 12 28 121 229 1064
Conclusion
In conclusion, my analysis shows that it is possible to predict music genre based on audio features with a reasonable degree of accuracy. Among the various models and tuning strategies explored, the K-nearest neighbors (KNN) algorithm and Euclidean distance metric performed the best in terms of the ROC AUC metric on the testing set. The final model achieved a high ROC AUC score, indicating that it can distinguish between different genres with high accuracy.
It is worth noting that while the KNN model performed well, there is still room for improvement. One potential avenue for future work could be to explore more advanced machine learning algorithms, such as neural networks or gradient boosting, which may be able to capture more complex relationships between the audio features and genre. Additionally, it may be beneficial to consider other features or data sources, such as lyrics or artist information, to further improve the predictive performance.
Overall, the results of this analysis demonstrate the potential of machine learning to automate the genre classification process in music, which can be useful in various applications such as music recommendation systems, playlist curation, and music indexing.
References
https://www.kaggle.com/datasets/mrmorj/dataset-of-songs-in-spotify?resource=download by Andrii Samoshyn